2) Normalization and Denormalization - From the Absolute Basics to the Advanced Level
If you haven't read the PART I - Data Types in SQL, read here!
Normalization and Denormalization
Now, let’s break down Normalization and Denormalization from the absolute basics to the advanced level so you understand why, how, and when to use them (and when not to).
We’ll do this in three layers:
- Foundations: What they are & why they matter
- Step-by-step Normal Forms with examples
- Denormalization: When & how to break the rules
1. What is Normalization?
Definition:
Goals of normalization:
- Avoid duplicate data (data redundancy)
- Ensure data dependencies make sense
- Make data easier to maintain and update
2. What is Denormalization?
Definition:
- Normalization = Clean and efficient data structure
- Denormalization = Fast access, but can create duplicates
3. Example Setup
- Alice’s name & address repeated
- Product price stored in every order (if price changes, multiple rows need updating)
4. The Normal Forms (Beginner → Advanced)
1NF - First Normal Form
Rules:
- No repeating groups (columns should be atomic values)
- Each column should store a single value
Good (1NF)✅:
2NF — Second Normal Form
- Must be in 1NF
- No partial dependency (non-key columns must depend on the whole primary key)Fix in our example
Rules:
Must be in 2NF
No transitive dependency (non-key columns must depend only on primary key, not on another non-key column)
CustomerAddress is stored in the Orders table, it depends on CustomerID (which depends on OrderID indirectly).CustomerAddress only in Customers table.BCNF - Boyce-Codd Normal Form
Stricter than 3NF
- Every determinant (column that defines another) must be a candidate key
Higher Forms (Advanced):
- 4NF: No multi-valued dependencies
- 5NF: No join dependencies
- 6NF: Mostly theoretical, used in temporal databases
5. Denormalization in Practice
- We need faster read performance
- We’re okay with some duplication to avoid complex joins
Orders and Customers to get CustomerName,CustomerName in Orders for quick reporting.❌ More storage
❌ Risk of inconsistent data if updates are not synced
6. Summary Table
7. Normalization & Denormalization Practice Set
Now, let’s build a Normalization & Denormalization practice set that checks:
-
✅ Concept understanding (theory questions)
-
✅ Coding ability (SQL table design & queries)
-
✅ Practical decision-making (when to normalize/denormalize)
Easy (Concept Basics)
a) 1NF
b) 2NF
c) 3NF
d) BCNF
Answer: a) 1NF
Explanation:
1NF ensures each cell holds a single atomic value, removing repeating groups and arrays in a single column.
2. True/False:
In 3NF, every non-key column should depend only on the primary key.
Answer: True
Explanation:
3NF removes transitive dependencies so no non-key column depends on another non-key column.
3. Given this table:
Which normal form is violated?
Answer: 2NF
Explanation:
CustomerName is repeated for each order, should be moved to a Customers table. That’s a partial dependency on part of the key.
Medium (SQL Design & Fixing)
Explanation:
We separated student details and course details into different tables to avoid repetition.
5. Why is the following table not in 3NF?
Answer:
Because
DepartmentName depends on DepartmentID (a non-key), not directly on EmployeeID. This is a transitive dependency.
6. Create a table in BCNF for storing library book loans.
Answer:
Explanation:In BCNF, every determinant is a candidate key, no non-trivial functional dependencies on non-candidate keys.
Hard (Practical Decision Making)
7. Your e-commerce database needs to display order history very quickly on the dashboard. The normalized version requires 4 joins. What’s a practical denormalization strategy?
Answer:
Store customer name, product name, and price directly in the Orders table (even if duplicated) for quick retrieval.
Explanation:
This reduces join complexity and improves query speed, at the cost of redundancy.
8. You find that updating a customer’s address takes too long because it’s stored in 5 different tables. Which principle is being violated and which normalization form would fix it?
Answer:
Violation: Data redundancy → Update anomaly.
Fix: 3NF: Store the address in a single Customers table.
9. Design a denormalized SalesSummary table from these normalized tables for analytics:
Normalized:
Orders(OrderID, CustomerID, OrderDate)
Customers(CustomerID, CustomerName)
OrderDetails(OrderID, ProductID, Quantity)
Products(ProductID, ProductName, Price)
Explanation:
We flattened multiple joins into one table for reporting, faster reads, but duplicated CustomerName and ProductName.
10. Advanced challenge: Given the table below, identify the highest normal form it satisfies:
Answer:
2NF only.
Explanation:
-
1NF: ✅ Atomic values
-
2NF: ✅ No partial dependencies (EmpName → EmpID, ProjectName → ProjectID)
-
3NF: ❌
Managerdepends onProjectName(transitive dependency)
Next Part: Indexing in SQL


Comments
Post a Comment